FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Research has shown that convolutional neural networks contain significant
redundancy, and high classification accuracy can be obtained even when weights
and activations are reduced from floating point to binary values. In this
paper, we present FINN, a framework for building fast and flexible FPGA
accelerators using a heterogeneous streaming architecture. By
utilizing a novel set of optimizations that enable efficient mapping of
binarized neural networks to hardware, we implement fully connected,
convolutional and pooling layers, with per-layer compute resources being
tailored to user-provided throughput requirements. On a ZC706 embedded FPGA
platform drawing less than 25 W total system power, we demonstrate up to 12.3
million image classifications per second with 0.31 µs latency on the MNIST
dataset with 95.8% accuracy, and 21906 image classifications per second with
283 µs latency on the CIFAR-10 and SVHN datasets with 80.1% and 94.9%
accuracy, respectively. To the best of our knowledge, ours are the fastest
classification rates reported to date on these benchmarks.
Comment: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017
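The binarization the abstract describes can be illustrated with a short sketch. This is not FINN's actual code, only the standard trick behind binarized inference: with weights and activations constrained to {-1, +1} and packed as bits, a dot product reduces to XNOR plus popcount, which is what makes the hardware mapping so efficient.

```python
# Hedged sketch, not FINN's implementation: a dot product over {-1, +1}
# vectors packed as bitmasks (bit set = +1, bit clear = -1) reduces to
# XNOR + popcount, the core operation of binarized inference.

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors stored as bitmasks."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # positions where the signs match
    matches = bin(agree).count("1")
    return 2 * matches - n              # +1 per match, -1 per mismatch

# a = [+1, -1, +1], b = [+1, +1, -1] (LSB-first): 1 - 1 - 1 = -1
print(binary_dot(0b101, 0b011, 3))  # -1
```

On an FPGA the same computation becomes a wide XNOR gate feeding a popcount tree, with no multipliers at all.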
Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor-optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determining which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study covers three commonly used hardware accelerators for embedded vision applications: the ARM57 CPU, the Jetson TX2 GPU and the ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1–3.2× over the other platforms for simple kernels, while for more complicated kernels and complete vision pipelines the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3×. We also observe that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
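The energy/frame metric behind those reduction ratios is straightforward to reproduce. A minimal sketch with invented numbers (not measurements from the study):

```python
# Illustrative only: the power and fps figures below are made up, not taken
# from the benchmark. Energy/frame = average power / throughput.

def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Average energy per processed frame, in millijoules."""
    return power_w / fps * 1000.0

cpu_mj  = energy_per_frame_mj(power_w=5.0, fps=20.0)    # hypothetical CPU
fpga_mj = energy_per_frame_mj(power_w=4.0, fps=200.0)   # hypothetical FPGA
ratio = cpu_mj / fpga_mj  # the FPGA's energy/frame reduction vs. the CPU
print(f"{cpu_mj:.0f} mJ vs {fpga_mj:.0f} mJ -> {ratio:.1f}x reduction")
```

Note that a platform can draw less power yet still lose on this metric if its throughput is low, which is why the study reports energy per frame rather than raw power.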
Effect of bone decalcification procedures on DNA in situ hybridization and comparative genomic hybridization. EDTA is highly preferable to a routinely used acid decalcifier
Decalcification is routinely performed for histological studies of
bone-containing tissue. Although DNA in situ hybridization (ISH) and
comparative genomic hybridization (CGH) have been successfully employed on
archival material, little has been reported on the use of these techniques
on archival decalcified bony material. In this study we compared the
effects of two commonly used decalcifiers, i.e., one proprietary,
acid-based agent (RDO) and one chelating agent (EDTA), in relation to
subsequent DNA ISH and CGH to bony tissues (two normal vertebrae, six
prostate tumor bone metastases with one sample decalcified by both EDTA
and RDO). We found that RDO-decalcified tissue was not suited for DNA ISH
in tissue sections with centromere-specific probes, whereas we were able
to adequately determine the chromosomal status of EDTA-decalcified
material of both control and tumor material. Gel electrophoresis revealed
that no DNA could be successfully retrieved from RDO-treated material.
Moreover, in contrast to RDO-decalcified tumor material, we detected
several chromosomal imbalances in the EDTA-decalcified tumor tissue by CGH
analysis. Furthermore, it was possible to determine the DNA ploidy status
of EDTA- but not of RDO-decalcified material by DNA flow cytometry.
Decalcification of bony samples by EDTA is highly recommended for
application in DNA ISH and CGH techniques.
Efficient Error-Tolerant Quantized Neural Network Accelerators
Neural Networks are currently one of the most widely deployed machine
learning algorithms. In particular, Convolutional Neural Networks (CNNs) are
gaining popularity and are being evaluated for deployment in safety-critical
applications such as self-driving vehicles. Modern CNNs feature enormous memory
bandwidth and high computational needs, challenging existing hardware platforms
to meet throughput, latency and power requirements. Functional safety and error
tolerance need to be considered as additional requirements in safety-critical
systems. In general, fault-tolerant operation can be achieved by adding
redundancy to the system, which further exacerbates the computational
demands. Furthermore, the question arises whether pruning and quantization
methods for performance scaling turn out to be counterproductive with regard
to fail-safety requirements. In this work we present a methodology to evaluate
the impact of permanent faults affecting Quantized Neural Networks (QNNs) and
how to effectively decrease their effects in hardware accelerators. We use
FPGA-based, hardware-accelerated error injection to enable fast evaluation.
A detailed analysis is presented showing that QNNs containing
convolutional layers are far less robust to faults than commonly believed
and can lead to accuracy drops of up to 10%. To circumvent that, we propose two
different methods to increase their robustness: 1) selective channel
replication which adds significantly less redundancy than used by the common
triple modular redundancy and 2) a fault-aware scheduling of processing
elements for folded implementations.
Comment: 6 pages, 5 figures
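For context on the baseline that selective channel replication improves upon, here is a minimal sketch of triple modular redundancy (TMR): the computation is triplicated and a majority vote masks a single faulty replica. The paper's method reduces the cost by replicating only fault-critical channels; the voting idea itself is the same. Names and values below are illustrative, not from the paper.

```python
from collections import Counter

# Hedged sketch of plain TMR (the baseline, not the paper's method): run the
# same computation on three replicas and majority-vote the results, so one
# faulty replica is masked.

def tmr_vote(replica_outputs):
    """Return the majority value among the replica outputs."""
    value, _count = Counter(replica_outputs).most_common(1)[0]
    return value

healthy = [42, 42, 42]
one_stuck_at_zero = [42, 0, 42]   # a permanent fault corrupts one replica
print(tmr_vote(healthy), tmr_vote(one_stuck_at_zero))  # 42 42
```

The 3× compute overhead of full TMR is exactly what motivates replicating only the channels whose faults actually hurt accuracy.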